Abstract


This chapter provides an introductory overview of some of the basic experimental 
paradigms traditionally employed in the field of gesture studies to investigate both com- 
prehension and production in adult and child populations. With respect to gesture pro- 
duction, the chapter taps into paradigms used for exploring both intra-psychological 
and inter-psychological functions of co-speech gestures. At the same time, the present 
chapter aims to shed light on some of the core questions researchers have been addressing 
in using the described paradigms, concluding with a reflection on some of the methodolog- 
ical shortcomings and limitations of the respective paradigms and methods used. 


1. Introduction 


Co-speech gestures occur in all cultures (Kita 2009) and in a wide variety of conversa- 
tional contexts. This includes more formal settings, such as doctor-patient/therapist- 
client interaction (Duncan and Niederehe 1974; Heath 1989, 2002), teacher-pupil 
interaction (Roth 2001), work contexts (Mondada 2007) and official gatherings (Streeck 
1994), as well as more informal conversational contexts, for example in interactions 
with acquaintances, friends and family (Efron 1941; Goodwin 1986; Kendon 1980, 
1985; Müller 2003; Seyfeddinipur 2004; Streeck 1994). The kinds of gestures used in
these contexts and the functions they fulfil are manifold. Explorations of co-speech
gestures occurring in natural contexts were the origin of gesture studies, and they
remain a prime focus of the field.


2. Can we capture co-speech gestural behaviour in experimental settings?


While analyses of gestures “in the real world” yield important insights in their own 
right, they are also an important source of inspiration for experimental research on 
gesture. This chapter focuses on the latter, in particular the experimental techniques 
and paradigms that have been employed to find answers to some of the central questions
in the field of co-speech gesture (mainly of a psychological nature). Before discussing
these in detail, it is important to emphasise that one fundamental assumption
underlying this research is that the behaviour we elicit in the laboratory is represen- 
tative of what we observe outside of it. Of course, one possibility is that, in any given 
experiment, the chosen stimulus material influences the number and nature of ges- 
tures used; therefore results have to be considered in their particular context. Largely, 
however, the things participants are asked to talk about in gesture experiments tend 
to also feature frequently in everyday talk, including spatial relations, actions, objects 
and persons. One potentially critical issue which remains, though, is the common use 
of cartoon pictures or videos. The semantics of a cartoon world are radically different 
to the world we live in — literally anything can happen, even physical impossibilities. It 
is therefore possible that speakers use gestures differently in talk about more mun- 
dane events, especially if we consider that one use of gesture may be to channel 
and influence addressees’ inferences; these, of course, could be crucially different 
when trying to process talk about rather unpredictable cartoon worlds (cf. Holler 
2003; Holler and Beattie 2003a). However, since, to the best of my knowledge, no study
to date has systematically investigated the extent to which gestural behaviour inside
and outside the lab is the same or different, we currently have no reason to discount
experimental research on these grounds (although the issue certainly requires future
research).

Further, participants in experimental settings are providing us with insight into 
spontaneously produced gestural behaviour (sometimes, bodily behaviour may be 
slightly inhibited initially since research ethics require us to inform participants 
when they are video-recorded, but warm-up conversations tend to mitigate this
problem). Moreover, we know that at least some of the same phenomena occur in
experimental and non-experimental gesture data. For example, imagistic
gestural representations are common in everyday conversation (e.g., Kendon 1985), 
and they occur frequently in laboratory-based communication, too (e.g., McNeill 
1992); similar parallels can be claimed for interactive gestures (these involve the 
addressee in the interaction and are often associated with handing over a turn or 
keeping the floor) which have been observed both in the lab (Bavelas et al. 1995; 
Bavelas et al. 1992) as well as in everyday talk (Duncan and Niederehe 1974; Kendon 
2004; Streeck and Hartge 1992). A further example is the so-called “return gesture”
(de Fornel 1992), in which one participant in a conversation repeats another’s gesture;
this has also been observed in experimental contexts (Holler 2003; Holler and Wilkin
2011; Kimbara 2006, 2008; Parrill and Kimbara 2006). The parallels mentioned here are
only a few, and although they do not constitute hard and fast evidence, they illustrate
that gestural behaviour resembling, at least in some important respects, that occurring
outside the laboratory can be observed in experimental settings.

Of course, all this is altogether less of an issue if we assume that co-speech gestures
are largely independent of the interactive processes taking place between the people
talking. The basic requirement here is that the experimental tasks participants engage
in appropriately model the cognitive demands encountered by participants in
communication. This brings us to the questions gesture researchers have been trying
to answer.
3. Some core questions in the field of co-speech gesture research


Of course, the number of questions researchers interested in co-speech gestures have 
tackled is vast. One of the major debates which has dominated the field in recent 
years focuses on whether co-speech gestures are indeed an integral part of language, 
a notion put forward by Bavelas and Chovil (2000), Clark (1996), Kendon (1980, 
2000), and McNeill (1985, 1992), amongst others. One aspect of this debate focuses 
on the function(s) of co-speech gestures and the idea that they may not necessarily 
be communicatively intended, but, rather, benefit the speaker him or herself (such as 
through the facilitation of lexical access (e.g., Krauss, Chen, and Gottesman 2000; 
Rauscher, Krauss, and Chen 1996) or conceptual planning (e.g., Hostetter, Alibali, 
and Kita 2007; Kita and Davies 2009)). A more overarching question, then, is why
we gesture when we speak; in addition to the discussion about the inter- and
intrapersonal functions of gesture, this question also addresses the evolutionary roots
and development of co-speech gestures and language (Corballis 2003; Kelly et al. 2002;
Rizzolatti and Arbib 1998; Tomasello 2008).


4. Experimental methods and paradigms 


The paradigms that have been employed to answer these questions experimentally are
based on a wide range of methods and techniques. The following sections provide a
general overview of these rather than in-depth discussions of individual paradigms;
owing to limitations of space, however, this overview cannot be exhaustive.


4.1. Co-speech gesture comprehension 


How we comprehend and process co-speech gestures has been explored experimentally 
to a large extent using “play-back paradigms”. Here, the participant takes on the role of 
an observer, decoding the information they are presented with in the form of a video 
stimulus. Researchers have used this kind of paradigm primarily to test whether co- 
speech gestures communicate or not. To do so, they have often combined this basic par- 
adigm with many variations regarding the conditions under which the video clips are 
generated and presented. 

One common method is to recruit a first set of participants who describe a range of 
stimuli (for example landscapes, buildings and people (Krauss, Morrel-Samuels, and 
Colasante 1991) or cartoon stories (Beattie and Shovelton 1999a, 1999b, 2001)) to a 
confederate (usually the experimenter). Such narration and description tasks have
been shown to elicit a large number of co-speech gestures. In a second stage of the
study, video clips showing isolated gestures from this footage are played to a new
set of participants, who either view gesture and speech together, view only the gesture
(without speech) or hear only the speech (without gesture); this method allows
researchers to evaluate the individual and combined contributions of the two modalities
to the decoders’ message comprehension and information uptake. The measures these
studies have used to identify whether the gestures have communicated information to 
the decoder-participants (both in the absence as well as over and above speech) vary. 
The techniques used by Krauss, Morrel-Samuels, and Colasante (1991) required decoders,
amongst other things, to identify the lexical affiliates of gestures, to interpret and
categorise gestures semantically, and to recall individual gestures. Beattie and Shovelton
(1999a, 1999b, 2001) used what they called a “semantic feature approach”, which
involved quizzing participants about the kinds of information they had received regard- 
ing a range of detailed semantic categories. This was done in various forms, using either 
open-ended or forced-choice questionnaires which participants completed for each clip. 
Feyereisen, van de Wiele, and Dubois (1988) used a similar method; however, they 
filmed people delivering lectures rather than describing imagistic stimuli and then 
played video clips of the gestures to decoders. Rather than measuring the amount 
the gestures communicated, their focus was on how well the decoders could differenti- 
ate gesture types (iconic and batonic). These judgements were made either with or 
without speech to see whether access to the verbal message content would modulate 
the perception (and communicativeness) of the gestures. 

Krauss et al. (1995) also used video footage of speakers communicating, but they in- 
cluded a condition in which participants exchanged information via an intercom (i.e., 
where addressees could not see the speakers’ gestures). They played these videos 
back to a set of decoders to compare the communicativeness of gestures which ap- 
peared to be produced for addressees and those which did not (measured in terms of 
the accuracy of decoders’ stimulus selection based on the speakers’ descriptions). 
Also, their study introduced a slightly different set of stimuli bearing more abstract 
features, such as synthesized sounds, tea flavours and abstract shapes. 

Studies by Rogers (1978) and Riseborough (1981) also used the basic play-back
paradigm to test the communicativeness of gestures, but by introducing conditions in
which only the speaker’s face was visible or the face was blanked out, they were able
to control for the contribution of the facial information accompanying gestural
representations (in contrast to the studies above). Another variation is the presentation
of noise at different levels of intensity to determine the importance of gestures when
speech is more or less intelligible. A further important difference from the studies above
is that the video footage played back to decoders stemmed from spontaneous interactions
between two “naive” interlocutors, rather than from interactions involving a
confederate (with the exception of Riseborough 1981, experiments 2 and 3). This is a
crucial point, as speakers in these experiments may have produced more natural gestures
than speakers talking to a confederate; I will come back to this issue in section 5.
The measure employed by Rogers (1978) bears similarity to the semantic feature 
approach used by Beattie and Shovelton (1999a, 1999b, 2001), as individual questions 
(with multiple choice answers) tapped different semantic aspects of the actions and 
objects described by the participants in the stimulus videos (an approach based on 
Fillmore 1971). Riseborough’s (1981) measure, in contrast, was based on participants’ 
guesses about the objects the gestures represented, their recall of gestures, and the 
information they inserted into blank fields in a transcript of the original narrative. 

Apart from adult-to-adult communication, researchers have also tested whether adults
can glean information from children’s gestures, motivated by the idea that co-speech
gestures can reveal something about children’s cognitive development (Alibali
and Goldin-Meadow 1993; Church and Goldin-Meadow 1986; Broaders et al. 2007;
Goldin-Meadow 2000, 2003). In particular, gestures can reveal whether children are
at a so-called transitional stage (a period just before their implicit knowledge
advances by a significant step). Because children may benefit a great
deal from instruction and input from the environment during such periods, it is an
important question whether adults (e.g., in the role of parents and teachers) are
sensitive to this information in the child’s gestural communication. A large number of
experimental studies have explored this issue, using a paradigm in which adults decode
information from children’s gestures, extracted from videos of spontaneous interactions 
between the child and an experimenter (e.g., Alibali, Flevares, and Goldin-Meadow 
1997; Goldin-Meadow and Sandhofer 1999; Goldin-Meadow, Wein, and Chang 1992). 
During these interactions, children were asked to explain mathematical equations or 
traditional Piagetian conservation problems, which children tend to grasp only at cer- 
tain developmental stages. The adult decoders were presented with clips of either just 
the speech, the speech accompanied by a “matching” gesture (the gesture represents 
the same information as the speech) or a “mismatching” gesture (the gesture represents 
different, supplementary information to that contained in speech). They were then 
asked to check questionnaire answers relating to the video vignettes tapping the infor- 
mation the children had provided, or to talk about the children’s explanations, with 
their own speech and gestures subsequently being analysed for content to see what 
information the adults had picked up. 

Another question is whether children are also able to glean information from co- 
speech gestures. Kelly and Church (1997) employed a paradigm very similar to that 
used by Goldin-Meadow and colleagues (e.g., Goldin-Meadow, Wein, and Chang 
1992), adapted to test the gesture comprehension of 7-year-olds. To do so, they employed
three measures: a recall task (children describing, in their own words, the responses given
by the children in the video vignettes), a questionnaire testing for the information the
children thought had been given in the videos, and a task requiring children to assess
whether they thought the children in the videos were just about ready to understand
the concepts they were explaining. Other studies have directly compared the decoding
abilities of children and adults using the same basic paradigm, combined with compre- 
hension and memory measures appropriate for the different age groups (Church, 
Kelly, and Lynch 2000; Kelly and Church 1998; Thompson and Massaro 1986). In addition,
some studies have started to investigate gesture comprehension in very young children
(around 1 year of age), focusing mainly on the understanding of the intentionality
associated with gestures. These have used quite different paradigms. For example, Gliga
and Csibra (2009) measured children’s looking times in response to objects appearing at 
locations indicated by pointing gestures or at the opposite side to that indicated by the 
gesture. 

Paradigms testing the communicativeness of co-speech gestures have been widely 
applied to children and healthy young adults; some studies have also adapted these
paradigms to other populations, such as older adults and people with aphasia (for examples, see
Cocks et al. 2009; Feyereisen, Seron, and de Macar 1981; Feyereisen and van der Linden 
1997; Thompson 1995). 

Apart from researching whether co-speech gestures communicate semantic infor- 
mation, studies have also focused on the comprehension of the pragmatic aspects of 
messages, in particular indirect requests. For example, Kelly et al. (1999) used a play- 
back paradigm to present observers with clips of an actor expressing indirect 
requests accompanied by gesture (or not), where the gesture provided additional information
relevant to interpreting the speaker’s communicative intent. They asked participants
to predict the response of the person who acted as addressee in the stimulus
video, thus measuring information uptake from the gestures and whether this information
was integrated with the interpretation of the speaker’s intended meaning.
Kelly (2001, experiment 1) investigated the role of gesture in pragmatic understanding
in children (3- to 5-year-olds) using a similar paradigm. Children watched video-recorded
interactions in which one person uttered an indirect request accompanied
by gesture or not, with children being asked what the speaker in the video had 
referred to. 

In addition to experiments presenting video clips to participants acting as observers/ 
decoders, some studies have tested the communicativeness of gestures in live interac- 
tions. Graham and Argyle (1975) asked individuals to describe abstract shapes (of 
high and low verbal encodability) to a group of addressees present in the same room. 
In one condition, describers were allowed to gesture freely; in the other, they were
asked to fold their arms. Addressees then drew the shapes, and the accuracy of their
drawings in the two conditions was evaluated as a measure of gestural communication.
Holler, Shovelton, and Beattie (2009) asked an actor to provide a scripted cartoon
narrative (based on spontaneously produced narratives) to addressees, including
the production of gestures which accompanied the original narratives. After the
narrations, addressees answered questions about the stories, which were then scored by
the experimenters for the information they contained according to individual semantic
features (some of which were represented only in the gestures). The communicativeness
of the gestures in the face-to-face condition was compared with two video conditions
(gesture plus speech, and gesture only), as well as with an audio-only condition (speech
without gesture). With
regard to children’s gestures, Goldin-Meadow and Sandhofer (1999) have shown that 
adults can glean significant amounts of information from them when observing the chil- 
dren communicate live with an experimenter, using the same paradigm as with their 
video-based play-back conditions. Kelly (2001, experiment 2) tested the communicative 
role of gesture in children’s pragmatic understanding live by engaging them in interac- 
tion with the experimenter who uttered indirect requests (using just speech, gesture and 
speech, or just gestures to make the request, such as by pointing at an object). The chil- 
dren’s success at understanding was reflected in their response to the indirect requests. 
Behne, Carpenter, and Tomasello (2005) and Gräfenhain et al. (2009) showed that gestures
are communicative in a live context even to very young children (14 months of
age). Their studies tested children’s interpretation of the communicative intent
associated with gestures produced by an adult. For example, in the task used by Gräfenhain
et al. (2009), one adult pointed towards one of two locations combined with either
averted gaze or gaze directed at another adult looking for a toy. Children who observed 
this scene were then allowed to look for the toy themselves, with their choice of location 
providing insight into their comprehension of gesture and gaze cues. 

Studies testing the communicativeness of gestures in a live, face-to-face context
advance our knowledge of gesture considerably, as they eliminate some of the potential
limitations of studies using video play-back techniques. In the latter, for example,
video clips of individual gestures are often presented to decoders without the
natural context in which they occur, thus isolating them from any other contextual
cues, and in some studies the clips were even played repeatedly. However, video
play-back paradigms do offer the advantage that gestures from spontaneous interactions
can be used as the stimuli, whereas in most of the face-to-face studies reviewed here
(with the exception of the studies by Graham and Argyle and by Goldin-Meadow and
Sandhofer described above), confederates or experimenters produced the gestures;
this issue will be addressed in more detail in section 5.

Yet another alternative is to base the decision about whether the information from
a gesture has been received and understood on the addressee’s behaviour, for example
in response to gestures with an interactive function, which are known to elicit certain
addressee responses (cf. Bavelas et al. 1995), or in cases where participants
mirror their interactants’ gestures (Holler 2003; Holler and Wilkin 2011; Kimbara
2006, 2008; Parrill and Kimbara 2006). Although such paradigms yield less detailed
insights (e.g., into exactly how the gesture was perceived and interpreted by the
addressee, or exactly how much information was received), they preserve most of the
natural interaction in which co-speech gestures are used.

Other paradigms measuring the communicativeness of co-speech gesture do not rely
on questionnaires or other pen-and-paper records of participants’ answers.
Eye-tracking studies, for example, have investigated recipients’ overt attention to
gestures by measuring the number and duration of recipients’ eye fixations on speakers’
gestural movements presented on video or in live conditions (Gullberg 2003; Gullberg
and Holmqvist 1999, 2006). Although these eye-tracking data can provide useful
insights into when and for how long participants overtly attend to gesture, the tool is
not sensitive enough to capture covert attention processes and is not suitable for
measuring information uptake from gestures (either its amount or its type), as uptake
does not appear to be clearly associated with direct fixations (Gullberg and Kita 2009).
Reaction time measurements, however, are a suitable method for tapping into more covert
processes of gesture comprehension (e.g., Kelly, Özyürek, and Maris 2010).

In addition to the behavioural measures reviewed above, at least two types of tech- 
niques from cognitive neuroscience have been used to measure the brain’s response to 
gestures. They are suitable for providing insight into information uptake from gestures, 
the relationship between gesture and speech and the way in which the brain processes 
information from the two modalities. More precisely, studies using ERPs (event-related
potentials; the measurement of the brain’s electrophysiological response as an indication
of its activity following stimulus presentation) are well suited to answering questions
about the time course of the processing and integration of different signals (including 
semantic integration, typically captured by the N400 component). The first ERP studies 
explored the semantic integration of speech and gesture using matching and mismatch- 
ing gestures either as primes to subsequently presented words (Kelly, Kravitz, and 
Hopkins 2004), within a sentence context presented simultaneously with speech 
(Özyürek et al. 2007), and in association with imagistic information (cartoon images)
and matching or mismatching words (Wu and Coulson 2007). ERP studies have also 
been used to investigate if and when the brain picks up information from co-speech ges- 
tures representing information that is not contained in the speech at all but semantically 
relevant for the interpretation of the verbal message (e.g., in the context of ambiguous 
speech, Holle and Gunter 2007). More recently, a study by Kelly, Creigh, and Bartolotti
(2010) used ERPs to gain insight into the extent to which gesture-speech integration
is voluntary or automatic.

fMRI techniques (Functional Magnetic Resonance Imaging; a technique used to 
measure the brain’s neural activity based on changes in blood flow, providing insight 
into brain area-specific activity in response to a stimulus or task) have been used to 
find out where in the brain gesture and speech are integrated and which neural
networks are involved in their processing. For example, Willems, Özyürek, and Hagoort
(2007) used a paradigm in which they varied the difficulty of gesture-speech integration
(with matching versus mismatching gestures) to explore this issue. In another study,
they compared co-speech gestures with gestures that are less strongly tied to speech
(pantomimes) to see whether they activate different or overlapping brain areas (Willems,
Özyürek, and Hagoort 2009). Other researchers have focused on gesture-speech
integration when gesture provides supplementary information which disambiguates speech
(Holle et al. 2008), and on the involvement of the human mirror system in co-speech
gesture processing (Skipper et al. 2007). Further, paradigms have manipulated the
degree of perceived communicative intent associated with gestures to investigate
pragmatic aspects of gesture-speech integration (e.g., by creating gesture-speech
mismatches produced by the same versus different persons (Kelly et al. 2007), by varying
the speaker’s gaze direction (Holler, Kelly, Hagoort, and Özyürek 2012), or by varying
their body orientation, either towards the participant or towards a third person
(Straube et al. 2010)).

In both ERP and fMRI studies exploring co-speech gesture processing, it is often 
necessary to work with video stimuli of a highly controlled nature, as, otherwise, it is 
difficult to attribute observed effects to the intended experimental manipulation. The 
stimuli used in these studies tend to be video clips of individual gestures presented 
on their own, accompanied by speech (words or sentences), or preceded by it. Due 
to the strong need for careful control, the stimuli usually involve a trained actor carry- 
ing out scripted hand movements. (Also, ERP and fMRI studies often incorporate addi- 
tional tasks requiring participants to answer questions or make some other kind of 
decision (e.g., using a push-button device). This results in additional datasets of reaction 
times (RTs) and response accuracy, for example, which provide further insight into the 
comprehension of co-speech gestures.) 


4.2. Co-speech gesture production 


A paradigm that has become widely established is that first used by David McNeill 
(1985, 1992), involving the use of cartoon videos (famously, Sylvester and Tweety
cartoons), which one participant watches and then retells to another while being video-recorded.
This paradigm was originally used without any further experimental manipulations 
and McNeill (1985, 1992) used the footage, rich in spontaneously produced co-speech 
gestures, to analyse the semantic relationship between gesture and speech. These 
analyses provided the basis for him to make the important argument that thought is 
externalised by, and that language consists of, both speech and co-speech gestures, and 
to model the kind of mental representations underlying speakers’ gesture-speech 
utterances. 

Several later studies used the same basic paradigm. Holler and Beattie (2002, 2003a)
used it to investigate and quantify the semantic interplay of gesture and speech using a
fine-grained semantic feature analysis (albeit with static rather than moving cartoon
images), and a host of studies have used it to explore cross-linguistic differences,
specifically how speakers of different languages package information in gesture and
speech when describing the same stimuli (e.g., Allen et al. 2007; Kita and Özyürek 2003;
McNeill 2001; McNeill and Duncan 2000; Özyürek et al. 2008; Özyürek et al. 2005).
The idea that co-speech gestures can provide us with greater insight into speakers’
underlying mental representations has also become highly relevant in developmental
psychology. Here, particularly the work by Susan Goldin-Meadow and colleagues (e.g., 
Alibali and Goldin-Meadow 1993; Church and Goldin-Meadow 1986; Broaders et al. 
2007; Goldin-Meadow, Alibali, and Church 1993; Goldin-Meadow 2003; Perry, Church, 
and Goldin-Meadow 1988) (see also Pine, Lufkin, and Messer 2004) has shown that 
children often externalise knowledge in co-speech gesture before they are ready to 
communicate verbally about the same concepts, such as in their explanations of
conservation, maths and balance problems. In these kinds of studies, children are given
problems of the aforementioned kind and are simply asked to explain their solutions.
Both gesture and speech can then be analysed for the semantic information they
represent. 

Because co-speech gestures bear a very close relationship to speech, researchers 
have been intrigued by the nature of this relationship, the role of gesture in the process
of speaking and communicating, and the exact functions gestures fulfil in talk. Different
experimental paradigms have been used to test different hypotheses; these can be 
broadly classed into those postulating cognitive functions (thus benefiting mainly the 
speaker) and those postulating communicative functions (thus benefiting primarily 
the addressee). However, although contrasted here and discussed separately, these 
approaches are not necessarily mutually exclusive. 

Goldin-Meadow et al. (2001) have argued that co-speech gestures may reduce a 
speaker’s cognitive load and thus free up cognitive capacities. In their paradigm, parti- 
cipants explained their solutions to a series of maths tasks, which they frequently accom- 
panied with gestures, while trying to remember a sequence of letters. This was combined 
with a standard memory test (tapping the letter sequences) to see whether those who 
gestured more would perform better, assuming that gesturing enabled participants to 
allocate more resources to the memory task. 

Other researchers have claimed that gestures maintain representations in spatial 
working memory, thus indirectly influencing speech production (Morsella and Krauss 
2004; Wesp et al. 2001). The paradigms of both Morsella and Krauss (2004) and Wesp et al.
(2001) required participants to describe stimulus objects either from memory (stimulus
absent) or while looking at them (stimulus present). A similar procedure was used by 
de Ruiter (1998, experiment 3) to test the lexical retrieval theory against the theory 
that gestures facilitate the encoding of imagery in speech. 

Co-speech gestures have also been postulated to facilitate conceptual planning 
during the speech production process. To investigate this hypothesis, researchers 
have used paradigms which compare conditions under which conceptual planning is 
easy versus difficult. For example, Alibali, Kita, and Young (2000) used traditional 
Piagetian conservation tasks and asked children to either explain why they thought 
two vessels held the same or different amounts of liquid, or describe how the two
vessels looked different. Other studies have asked participants to describe to another
person a range of shapes made up of lines connecting a number of dots; to create
a conceptually more difficult condition, they removed the lines, leaving just dot
patterns less suggestive of a particular shape (Hostetter, Alibali, and Kita 2007). Another
study created conditions where participants had to describe geometrical shapes with 
or without distracting lines creating competing conceptualisations (Kita and Davies 
2009). Melinger and Kita (2007) increased conceptual planning load by asking
participants to complete a secondary, competing task (either similar to or different
from the primary task).

Lexical access is another core component of the speech production process, happen- 
ing at a later stage than the conceptual planning of messages. Some researchers have 
argued that it is at this point that co-speech gestures fulfil a facilitating function (e.g., 
Krauss, Chen, and Gottesman 2000; Rauscher, Krauss, and Chen 1996). Two main
paradigms have been used to test this particular theory. One involves preventing
speakers from gesturing and then analysing the effects on verbal encoding; researchers
have used various methods to restrict speakers’ gestures, such as asking them to
press desk-mounted hand switches (Lickiss and Wellens 1978), hold objects in their
hands (Frick-Horbury and Guttentag 1998, experiment 1), or keep their arms folded
(Beattie and Coughlan 1999; Graham and Argyle 1975). However, this procedure may
require speakers to divide their attention, concentrating not only on the experimental
task but also on keeping their hands still. Other studies have therefore used methods
that made it physically impossible for speakers to move their hands, for example by
fastening their forearms to arm rests (Rimé et al. 1984), by immobilising their hands in
special apron pockets (Frick-Horbury and Guttentag 1998, experiment 2), or by attaching
electrodes to their palms under the pretext that the experiment focused on the
psychophysiological recordings made during the task (Rauscher, Krauss, and Chen 1996).

A second paradigm used to investigate the function of gesture and test the theory of 
lexical access involves the elicitation of tip-of-the-tongue (ToT) states. For example, 
researchers have tried to induce ToT states for particular target words by providing
participants with the dictionary definitions (and the first letter) of a range of words
(Beattie and Coughlan 1999; Frick-Horbury and Guttentag 1998) or, when studying ToT
states in children, with pictures of objects (Pine, Bird, and Kirk 2007; Yan and Nicoladis
2009). The aim was then to analyse the frequency and type of gestures accompanying
ToT states, and to compare the number of ToT states resolved with and without gesture.
Some researchers have combined this paradigm with the gesture prevention paradigm
described above (e.g., Beattie and Coughlan 1999; Frick-Horbury and Guttentag 1998).

In addition to linking gesture use to cognitive, intra-personal functions, researchers 
have also developed paradigms to investigate their communicative, inter-personal 
ones. One well-established paradigm involves the manipulation of visibility between
the speaker and the addressee, usually by separating the two participants, who are in
the same room, with an opaque screen (Alibali, Heath, and Myers 2001; Gullberg 2006).
Cohen and Harrison (1973), who were amongst the first to experimentally investigate 
the communicative functions of gestures, used a combined manipulation of both visibil- 
ity and co-presence; instead of being separated by an opaque screen, participants were 
located in different rooms and communicated via an intercom (they compared this to a 
face-to-face condition). In 1977, Cohen published a follow-up study and introduced a 
third condition in which participants talked into a tape recorder, thus removing the
addressee completely (speakers were told they were just practising the task). This
allowed him to compare the influence of visibility and co-presence on gestures with the
influence of a completely absent addressee. Mol et al. (2009) compared four conditions:
participants communicated face-to-face, separated by an opaque screen, or via a webcam
(here the participant did not see the addressee but was told that the addressee could
see them via a video link); the fourth condition was set up exactly like the webcam
condition, but participants were told that their communication (both audio and visual)
would be fed into a computer. This manipulation allowed the authors to compare the
first three communicative contexts with one another, as well as human-human and
human-machine communication.

Bavelas et al. (2008) introduced another important manipulation to tease apart 
the influence of visibility and dialogue on gesture use. In addition to visibility and co- 
presence, they varied dialogic interaction. This was done by comparing a face-to-face 
and a screen condition, in which interactants were free to engage in dialogue, with a 
tape recorder condition, in which participants believed they were recording their mes- 
sage for another person who would listen to the recording later — thus, neither did these 
speakers see their addressees, nor did they engage in dialogue with them. This work 
builds on a series of other studies manipulating monologue and dialogue and investigat- 
ing the role of the addressee’s involvement in gesture and language use (Bavelas et al. 
1995; Bavelas, Coates, and Johnson 2000; Bavelas et al. 1988; Bavelas et al. 1986).
Beyond this research on dyadic interaction, which manipulated visibility, co-presence
and dialogic interaction, researchers have investigated the influence of another
contextual factor on co-speech gestures: addressee location (Özyürek 2002). Here,
speakers talked to either one or two addressees who were located directly opposite or
towards the side. Speakers’ use of gesture when representing spatial information was
compared between these conditions to provide insight into recipient design in gesture
use. Further, because this research involved multi-party interactions, it extends our
knowledge of gesture use from dyadic interactions to triads.

Apart from the influence of the degree of interactivity and physical contextual fac- 
tors (such as co-presence, visibility, number of addressees and their location), re- 
searchers have investigated the influence of more cognitive, covert processes of 
conversation. One variable that has been manipulated in this context is the common
ground between speakers and their addressees (i.e., the knowledge, beliefs and
assumptions mutually shared by participants in an interaction; Clark 1996). Common
ground has been experimentally induced in a variety of ways. Gerwing and Bavelas
(2004) asked participants to play with either the same (common ground) or a different
(no common ground) set of toys and then asked one person to tell another about
their experiences with the toys. The gestures used to refer to the toys in the two
groups were compared for differences in their form (precision). Apart from creating
common ground based on shared action-based experiences, researchers have also 
used paradigms to induce it visually by presenting stimuli to the speaker and the 
addressee or just to the speaker (who then talks to an unknowing addressee). Holler 
and Stevens (2007) used images showing particularly large entities amongst smaller 
ones and focused their analysis on the effect of common ground on the encoding 
of size information in both gesture and speech. Holler and Wilkin (2009) used a 
similar method, but instead of pictures used a short video, allowing their analysis 
to focus on a wider range of semantic features (relating to actions, objects and 
persons, as well as their attributes). Parrill (2010) also used video stimuli to experi- 
mentally manipulate common ground, but instead of a longer video (telling a 
whole story) she used a short clip showing a single event and, similar to Holler and 
Stevens (2007), focused her analysis on one semantic aspect of it (here, the ground ele- 
ment). In addition, she combined this with the manipulation of “information salience”, 
i.e., whether or not the ground element had been mentioned by an experimenter before
the participant referred to it. Holler (2003) and Jacobs and Garnham (2007) manipulated
common ground by asking participants to relay the same description of events
represented in cartoon pictures to the same addressee repeatedly (thus accumulating
common ground) and then comparing the speakers’ gesture rate across the trials.
Jacobs and Garnham (2007) also used joint visual availability of the stimulus to induce
common ground, by providing both speaker and addressee with a view of the stimulus
while it was being described.

Other studies investigating the link between communicative intent and co-speech 
gestures have manipulated verbal ambiguity to find out whether speakers would draw 
on the gestural modality to clarify their speech for the interlocutor (Holler and Beattie 
2003b). Further, Melinger and Levelt (2004) investigated whether co-speech gestures 
encode what they defined as “necessary” information, and whether in such cases speakers 
were less likely to also represent this information in speech. 

Although not manipulating communicative intent directly, studies focusing on
gestural mimicry (Holler and Wilkin 2011; Kimbara 2006, 2008; Parrill and Kimbara
2006) provide insight into the collaborative use of co-speech gestures and further
our knowledge of how co-speech gestures are used communicatively.

Finally, many of the paradigms reviewed above have been adapted (and in some 
cases special paradigms have been newly created) to investigate gesture production in 
populations other than children and the “healthy student adult”, such as in older adults 
(Feyereisen and Havard 1999), split-brain patients (Kita and Lausberg 2008; Lausberg 
and Kita 2002; Lausberg et al. 2003), aphasic patients (Cocks, Hird, and Kirsner 2007; 
Hadar et al. 1998) and Alzheimer’s patients (Carlomagno et al. 2005; Glosser, Wiley, 
and Barnoski 1998). 


5. Some methodological shortcomings and limitations 


One much-debated issue is the use of confederates in production (but also comprehension)
studies of gesture. The obvious reason is that the verbal and nonverbal behaviour
of confederates may seem unnatural as it is non-spontaneous. Of course, confederates 
may be able to control some behaviours quite well, such as whether to ask a question at 
a certain point or not. But this may not be the case for others, especially quick, fleeting 
micro-behaviours which are under much less voluntary control (movements of the facial 
muscles, for example) and behaviours that are difficult to produce consistently across
experimental trials (e.g., the exact intonation with which we utter something). What
aggravates the problem further is that in most cases the same confederate is “used” in
more than one experimental trial. This means that when listening to descriptions or
narratives of certain stimuli, by the second trial at the latest the confederate has
pre-existing knowledge about what the speaker is telling (i.e., it is given information,
without the speaker knowing this) and hence may respond to it differently than when
being provided with new information. This, in turn, may of course affect the participant’s
behaviour. An additional problem is that in many cases the experimenter takes on the
role of the confederate. This means that they are familiar with the experimental manipulations and in
most cases probably also with the exact hypotheses. Potentially, this can have a
considerable impact on the confederate’s behaviour, which may be influenced by their
particular expectations. The participant’s verbal and/or gestural behaviour may, as a
result, be biased in a certain direction (e.g., the experimenter may, unconsciously,
respond more enthusiastically or encouragingly, verbally or nonverbally, in cases where
the participant has displayed gestural behaviour in line with the experimental predictions,
or sanction behaviour going against them, such as by withholding positive feedback).

Of course, there may sometimes be good reasons for researchers to use
confederates in their studies. In comprehension studies, it is important to isolate a single
manipulation or difference to test a particular hypothesis and obtain clear results.
Especially in ERP and MRI studies, a tightly controlled, carefully constructed stimulus
(optimising the signal-to-noise ratio) is necessary for the measurements to pick up any
meaningful and unequivocally interpretable responses from the brain. Other reasons are
that confederates producing scripted behaviour allow researchers to examine recipients’
responses to these behaviours, which may be useful when the natural occurrence of such
behaviours is rather rare (meaning that an unmanageably large number of participants
and hours of recordings would be needed to obtain a large enough dataset), or when the
social context in which they occur creates too much noise for a clear analysis. Yet another
reason for using confederates, at least in production studies, is the availability of
resources, including the size of the participant pool, financial means for compensating
participants, and the greater effort and difficulty associated with recruiting unacquainted
participants as pairs. Although, regarding this latter reason, scientific rigour and validity
should certainly carry more weight, much of the research using confederates in production
studies was carried out quite a few years ago, when the strong influence that
social-interactional contexts can have on gesture use was not yet well known. Researchers nowadays
benefit from this awareness, and where future studies need to employ confederates as
addressees or stimulus actors for the above-named (or other) reasons, one way of
reducing methodological limitations is to complement the analyses with a second, smaller
dataset of spontaneous interactions between “naive” participants in the same
context. This helps to demonstrate that similar behaviour occurs in a more natural
context. Another (or, better, an additional) option is to have the consistency and
naturalness of the confederate’s behaviour established by a separate set of independent
observers.

Another controversial issue is the manipulation of the interaction between speaker 
and addressee. In many of the production studies cited in this chapter, the participant 
taking on the role of the addressee was asked not to interrupt the speaker with
questions (although they could still provide back-channel responses). It appears that one of
the reasons researchers choose to limit the amount of dialogic exchange is the possi- 
bility of experimental confounds. Studies often measure the influence of various cog- 
nitive and social variables on gesture by focusing on gesture frequency or gesture 
rate. However, we also know that verbal interaction itself (as compared to monologue)
influences gesture rate, independently of any additional manipulation (Bavelas
et al. 1995; Beattie and Aboudan 1994). Thus, when manipulating, for instance, common 
ground or conceptual load, participants may interact more with their addressee in one of 
the experimental conditions than in another (e.g., participants may feel more rapport 
with the other participant when mutually sharing certain knowledge, or they may seek 
more help or feedback from their addressee when finding communication conceptually 
more difficult). In such a case, a higher gesture rate in one of the conditions could be
due to the difference in dialogic interaction per se rather than, or as well as, to the
experimental manipulation.
However, considering that other studies have shown that dialogic involvement of
the addressee has a crucial impact on gestural behaviour (Bavelas et al. 1995; Beattie
and Aboudan 1994), studies restricting interaction may be fundamentally limited in
the extent to which their findings can be generalised to dialogue. Because dialogue is
one of the most common forms of everyday conversation, this is a serious potential
limitation. While researchers certainly need to be aware of this limitation (and take it into
account when drawing their conclusions), those studies based on restricted interactions 
are certainly not without value. This is because everyday talk constitutes a continuum, 
ranging from monologue to dialogue (Pickering and Garrod 2004). People talk in 
monologue when delivering lectures, conference talks or other oral presentations, 
and in conversation, individual speakers often take extended turns to tell stories,
anecdotes or jokes, or to describe how someone gets from A to B, what procedure to follow
to achieve a certain goal, or other complex matters whose explanation stretches
over several sentences. During such extended turns it is not rare that addressees provide
mainly backchannel responses rather than take the floor. In other cases, some interlo- 
cutors may simply be more dominant, vocal or extrovert and therefore talk consider- 
ably more than others, possibly leaving no opportunity at all for turn contributions 
from other participants for much of the conversation. Moreover, in almost all of the 
gesture production studies reviewed here, one participant is assigned the role of the 
speaker who has all the information (i.e., who has seen the stimuli) and who tells it 
to their addressee. This sort of situation leads, for obvious reasons, mainly to conversa- 
tions dominated by one individual with a limited number of turns between speakers, 
even when these are completely free to interact. Considering the wide range of different 
forms of talk, it is important that our research reflects this spectrum, thus capturing 
human communication as the multi-faceted phenomenon that it is. At the same time,
though, it is vital that researchers recognise the particular facet that individual datasets 
and analyses are representative of and be wary of over-generalisation. 

Alternatively, researchers may choose to explore free interaction as a default, and in 
contexts where differences in dialogic interaction could confound results, undertake 
steps to tackle these unwanted influences. For example, experimental groups could be 
compared on the number of turns taken, the number of questions asked, and so on. If
differences on these dimensions are found, statistical procedures that partial out the 
respective influences could be employed. This would allow researchers to carry out
experimental studies that exert some degree of control over aspects such as the
content of talk (narration/description of set stimuli) without compromising spontaneous
social interaction or running the risk of unnecessary reductionism; only
approaches using a social unit of analysis offer the opportunity to capture those
processes that cannot be captured by simply “summing the parts” (cf. Bavelas 2005).
Considering that we still know relatively little about gesture as a social behaviour and its
use in dialogic interaction, experimental paradigms based on spontaneous, free interaction
between non-confederates are certainly one main avenue researchers in this field need
to pursue.
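One way to implement the covariate adjustment suggested above is an ANCOVA-style regression in which the effect of the experimental condition on gesture rate is estimated with the number of turns partialled out. The Python sketch below is illustrative only and rests on assumptions not made in this chapter: the function names are hypothetical, the six data points are toy numbers constructed so that turn-taking differs between conditions (they are not observed data), and a real analysis would use an established statistics package and check the model's assumptions.

```python
from statistics import mean

# Hypothetical toy data (constructed for illustration, not observed):
# condition 0 = e.g. no common ground, 1 = common ground. Speakers in
# condition 1 happen to take more turns, which could inflate gesture rate.
condition = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
n_turns = [2.0, 4.0, 6.0, 6.0, 8.0, 10.0]
gesture_rate = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # e.g. gestures per 100 words


def residualise(values, covariate):
    """Residuals of `values` after a simple linear regression (with intercept) on `covariate`."""
    mx, my = mean(covariate), mean(values)
    sxx = sum((x - mx) ** 2 for x in covariate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]


def raw_difference(rate, cond):
    """Unadjusted difference in mean gesture rate (condition 1 minus condition 0)."""
    in_1 = [r for r, c in zip(rate, cond) if c == 1]
    in_0 = [r for r, c in zip(rate, cond) if c == 0]
    return mean(in_1) - mean(in_0)


def adjusted_effect(rate, cond, covariate):
    """Condition effect on gesture rate with `covariate` partialled out.

    By the Frisch-Waugh-Lovell theorem, regressing the covariate-residualised
    outcome on the covariate-residualised condition dummy gives the same
    coefficient as `cond` in the regression rate ~ intercept + cond + covariate.
    Both residual series have mean zero, so no intercept is needed here.
    """
    r_rate = residualise(rate, covariate)
    r_cond = residualise(cond, covariate)
    return sum(a * b for a, b in zip(r_cond, r_rate)) / sum(b * b for b in r_cond)


print(raw_difference(gesture_rate, condition))  # 3.0
print(adjusted_effect(gesture_rate, condition, n_turns))  # close to 1.0
```

With these toy numbers, the raw group difference of 3.0 shrinks to roughly 1.0 once the number of turns is taken into account, illustrating how a difference in dialogic interaction could otherwise be mistaken for an effect of the manipulation itself.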


6. Conclusion 


In this chapter, I have tried to provide an overview of the range of basic paradigms (and 
their variations) employed in experimental co-speech gesture research, combined with 
some degree of critical reflection on aspects of these procedures. Due to the scope of
this article, this overview remains selective and limited in many ways, but I hope to
have provided a starting point here, especially for scholars new to the
field of co-speech gesture research.

Ultimately, when choosing between different experimental methods and weighing up
their pros and cons, what is gained and lost by opting for a particular paradigm
depends on the researcher’s exact aim. With respect to the interpretation of
research findings, it is important to recognise that differences in research results may 
in fact be rooted in differences between experimental paradigms used, even if these 
may seem small (such as regarding the degree of dialogic interaction). Further, it is 
important to be careful with the generalisation of findings and with distinguishing, 
for example, whether results tell us something about gesture use in dialogue or in 
more monologue-type contexts; or whether they tell us nothing about gesture use in 
interaction at all but useful things about the gesture-speech relationship nevertheless 
(i.e., something that Bavelas 2005 has referred to as studying the mind, or individuals’ 
thinking, as opposed to social interaction). 

In trying to make a choice between different paradigms in the light of their
methodological advantages and limitations, the most fruitful approach may still be one that
combines less and more experimentally controlled methods (provided there are good
reasons for employing the latter). This way, we might throw light on the phenomena we
aim to investigate from a range of different angles, capturing partly different aspects, 
and obtaining the most comprehensive answers. In my view, different experimental
methods and techniques complement one another, much as laboratory-based research on
co-speech gesture complements observations of gesture in non-experimental contexts,
both in terms of the methods used and the questions answered.
Abstract


For the scientific observation of non-verbal communication behavior, video recordings 
are the state of the art. However, everyone who has conducted at least one video-based 